Abstract

Privacy  policies  in  sectors  as  diverse  as  Web  services,  finance  and  healthcare  often  place  restrictions 
on  the  purposes  for  which  a  governed  entity  may  use  personal  information.  Thus,  automated  methods 
for  enforcing  privacy  policies  require  a  semantics  of  purpose  restrictions  to  determine  whether  a  governed 
agent  used  information  for  a  purpose.  We  provide  such  a  semantics  using  a  formalism  based  on  planning. 
We model planning using Partially Observable Markov Decision Processes (POMDPs), which support
an  explicit  model  of  information.  We  argue  that  information  use  is  for  a  purpose  if  and  only  if  the 
information  is  used  while  planning  to  optimize  the  satisfaction  of  that  purpose  under  the  POMDP 
model.  We  determine  information  use  by  simulating  ignorance  of  the  information  prohibited  by  the 
purpose  restriction,  which  we  relate  to  noninterference.  We  use  this  semantics  to  develop  a  sound  audit 
algorithm  to  automate  the  enforcement  of  purpose  restrictions. 


1  Introduction 

Purpose  is  a  key  concept  for  privacy  policies.  Some  policies  limit  the  use  of  certain  information  to  an  explicit 
list  of  purposes.  The  privacy  policy  of  The  Bank  of  America  states,  “Employees  are  authorized  to  access 
Customer  Information  for  business  purposes  only.”  [5].  The  HIPAA  Privacy  Rule  requires  that  healthcare 
providers  in  the  U.S.  use  protected  health  information  about  a  patient  with  that  patient’s  authorization 
or  only  for  a  fixed  list  of  allowed  purposes,  such  as  treatment  and  billing  [30].  Other  policies  prohibit 
using certain information for a purpose. For example, Yahoo!’s privacy policy states “Yahoo!’s practice on
Yahoo! Mail Classic is not to use the content of messages stored in your Yahoo! Mail account for marketing
purposes.” [47].

Each of these examples presents a constraint on the purposes for which the organization may use
information. We call these constraints purpose restrictions.

Let  us  consider  a  purpose  restriction  in  detail.  As  a  simplification  of  the  Yahoo!  example,  consider  an 
advertising  network  attempting  to  determine  which  advertisement  to  show  for  marketing  to  a  visitor  of  a 
website  (such  as  an  email  website).  To  improve  its  public  image  and  to  satisfy  government  regulations,  the 
network  adopts  a  privacy  policy  containing  a  restriction  prohibiting  the  use  of  the  visitor’s  gender  for  the 
purpose  of  marketing. 

The  network  has  access  to  a  database  of  information  about  potential  visitors,  which  includes  their  gender. 
Since  some  advertisements  are  more  effective,  on  average,  for  some  demographics  than  others,  using  this 
information is in the network’s interest. However, the purpose restriction prohibits the use of gender for
selecting advertisements since it is a form of marketing. Since tension exists between selecting the most
effective ad and obeying the purpose restriction, internal compliance officers and government regulators
should audit the network to determine whether it has complied with the privacy policy.

*This research was supported by the U.S. Army Research Office grants DAAD19-02-1-0389 and W911NF-09-1-0273 to CyLab,
by the National Science Foundation (NSF) grants CCF0424422 and CNS1064688, and by the U.S. Department of Health and
Human Services grant HHS 90TR0003/01. The views and conclusions contained in this document are those of the authors and
should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the
U.S. government or any other entity.

However, the auditors may find manually auditing the network difficult and error-prone, leading them to
desire automated tools to aid them. Indeed, the difficulty of manually auditing purpose restrictions has led
to commercial software for this task (e.g., [14]). However, the approaches taken by such software have been ad hoc.

Our  goal  is  to  place  purpose  restrictions  governing  information  use  on  a  formal  footing  and  to  automate 
their  enforcement.  In  the  above  example,  intuitively,  the  auditor  must  determine  what  information  the 
network  used  while  planning  which  ads  to  show  to  a  user.  In  general,  determining  whether  the  purpose 
restriction  was  obeyed  involves  determining  facts  about  how  the  audited  agent  (a  person,  organization,  or 
computer  system)  planned  its  actions.  In  particular,  philosophical  inquiry  [41]  and  an  empirical  study  [42] 
show  that  the  behavior  of  an  audited  agent  is  for  a  purpose  when  the  agent  chooses  that  behavior  while 
planning  to  satisfy  the  purpose.  Our  prior  work  has  used  a  formal  model  of  planning  to  automate  the 
auditing  of  purpose  restrictions  that  limit  visible  actions  to  certain  purposes  [42]. 

We  build  upon  that  work  to  provide  formal  semantics  and  algorithms  for  purpose  restrictions  limiting 
information  uses,  whose  occurrence  the  auditor  cannot  directly  observe.  For  example,  while  the  ad  network 
is  prohibited  from  using  the  visitor’s  gender,  it  may  access  the  database  to  use  other  information  even  if  the 
database  returns  the  gender  as  part  of  a  larger  record.  Thus,  our  model  must  elucidate  whether  the  network 
used  the  gender  component  of  the  accessed  information. 

To  provide  auditing  algorithms,  we  need  a  formal  model  of  planning.  Fortunately,  research  in  artificial 
intelligence  has  provided  a  variety  of  formal  models  of  planning.  To  select  an  appropriate  model  for  auditing, 
we  examine  the  key  features  of  our  motivating  example  of  the  ad  network.  First,  it  shows  that  purposes  are 
not  just  goals  to  be  achieved  since  the  purpose  of  marketing  is  quantitative:  marketing  can  be  satisfied  to 
varying  degrees  and  more  can  always  be  done.  Second,  the  example  shows  that  outcomes  can  be  probabilistic 
since  the  network  does  not  know  what  ad  will  be  best  for  each  visitor  but  does  have  statistical  information 
about  various  demographics.  Lastly,  the  policy  is  governing  the  use  of  information.  Thus,  our  model  needs 
an  explicit  model  of  information. 

The  first  two  features  suggest  using  Markov  Decision  Processes  (MDPs),  which  we  have  successfully 
used  in  an  auditing  algorithm  for  purpose  restrictions  on  observable  actions  [42].  However,  needing  an 
explicit  model  of  information  requires  us  to  use  an  extension  of  MDPs,  Partially  Observable  Markov  Decision 
Processes  (POMDPs),  which  make  the  ability  of  the  planning  agent  to  observe  its  environment  and  collect 
information  explicit.  We  use  a  POMDP  to  model  the  agent’s  environment  where  the  purpose  in  question 
defines  the  reward  function  of  the  POMDP.  The  explicitness  of  observations  (inputs)  in  the  POMDP  model 
allows  us  to  go  beyond  standard  research  on  planning  to  provide  a  semantics  of  information  use  by  considering 
how  the  agent  would  plan  if  some  observations  were  conflated  to  ignore  information  of  interest. 

In  more  detail,  we  quotient  the  POMDP’s  space  of  observations  to  express  information  use.  Intuitively, 
to  use  information  is  to  see  a  distinction,  and  to  not  use  information  corresponds  to  ignoring  this  distinction. 
Thus,  we  quotient  by  an  equivalence  relation  that  treats  two  observations  as  indistinguishable  if  they  differ 
only  by  information  whose  use  is  prohibited  by  a  purpose  restriction.  For  example,  the  ad  network  promising 
not  to  use  gender  should  quotient  its  observations  by  an  equivalence  relation  that  treats  the  genders  as 
equivalent.  By  conflating  observations  that  differ  only  by  gender,  the  network  will  ignore  gender,  simulating 
ignorance  of  it.  Such  quotienting  is  defined  for  POMDPs  since  observations  probabilistically  constrain  the 
space  of  possible  current  states  of  the  agent’s  environment,  and  quotienting  just  decreases  the  constraint’s 
accuracy. 

We  use  our  quotienting  operation  to  provide  two  different  definitions  of  what  it  means  for  an  agent  to 
obey  a  purpose  restriction  involving  information  use.  The  first  requires  that  the  agent  uses  the  quotiented 
POMDP  to  select  its  behavior.  We  call  this  definition  cognitive  since  it  refers  to  the  agent’s  cognitive 
process  of  selecting  behavior.  Since  the  auditor  cannot  examine  the  agent’s  cognitive  processes  and  might 
only  care  about  their  external  consequences,  we  offer  a  second  weaker  definition  that  depends  upon  the 
agent’s  observable  behavior.  The  behaviorist  definition  only  requires  that  the  agent’s  behaviors  be  consistent 
with  using  the  quotiented  POMDP.  It  does  not  depend  upon  whether  the  agent  actually  used  that  POMDP 
or a different process to select its behavior.

We  use  the  behaviorist  definition  as  the  basis  of  an  auditing  algorithm  that  compares  the  behaviors 
of  an  agent  to  each  of  the  behaviors  that  is  acceptable  under  our  notion  of  simulated  ignorance.  Despite 
comparing  to  multiple  behaviors,  our  algorithm  only  needs  to  optimize  the  quotiented  POMDP  once.  For  the 
behaviorist  definition,  we  prove  that  the  algorithm  is  sound  (Theorem  1)  and  is  complete  when  the  POMDP 
can  be  optimized  exactly  (Theorem  2). 

To show that our semantics is not too weak, we compare it to noninterference, a formalization of
information use for automata found in prior security research [15]. This definition examines how an input to an
automaton  affects  the  automaton’s  output.  Our  approach  is  similar  but  uses  POMDPs  instead  of  automata. 
We  relate  the  two  models  by  defining  how  an  automaton  can  implement  a  strategy  for  a  quotiented  POMDP, 
which  allows  us  to  prove  that  the  cognitive  definition  implies  a  form  of  noninterference  (Theorem  3).  On 
the  other  hand,  we  show  that  an  agent  can  obey  the  behaviorist  definition  while  still  exhibiting  interference. 
However, interestingly, such interference cannot actually further the restricted purpose, showing that the
behaviorist definition is still strong enough to prevent interference for that purpose.

Since  an  action’s  purpose  can  depend  upon  how  it  fits  into  a  chain  of  actions,  we  focus  on  post-hoc 
auditing.  Nevertheless,  other  enforcement  mechanisms  can  employ  our  semantics.  Despite  focusing  on 
privacy  policies,  our  semantics  and  algorithm  may  aid  the  enforcement  of  other  policies  restricting  the  use 
of  information  to  only  certain  purposes,  such  as  those  governing  intellectual  property. 

Contributions  and  Outline.  We  start  by  reviewing  related  work  and  POMDPs  (Sections  2  and  3). 
Our  first  contribution  is  definitional:  we  use  our  quotienting  characterization  of  information  use  to  provide 
both  the  cognitive  and  behaviorist  definitions  of  complying  with  a  purpose  restriction  on  information  use 
(Section  4).  Our  second  contribution  is  our  auditing  algorithm  accompanied  by  theorems  of  soundness 
and  a  qualified  form  of  completeness  (Section  5).  Our  final  contribution  is  relating  our  formalization  to 
noninterference with a theorem showing that the cognitive definition implies noninterference (Section 6).
We end with conclusions (Section 7).

2  Prior  Work 

Our  work  builds  upon  three  strands  of  prior  work:  information  flow  analysis,  enforcing  purpose  restrictions, 
and  planning. 

Information Flow Analysis. Research on information flow analysis led to noninterference [15], a
formalization of information flow, or use. However, prior methods of detecting noninterference have typically
required  access  to  the  program  running  the  system  in  question.  These  analyses  either  used  the  program 
for  directly  analyzing  its  code  (see  [37]  for  a  survey),  for  running  an  instrumented  version  of  the  system 
(e.g.,  [44,  28,  45,  24]),  or  for  simulating  multiple  executions  of  the  system  (e.g.,  [48,  10,  12]).  Traditionally, 
the  requirement  of  access  to  the  program  has  not  been  problematic  since  the  analysis  has  been  motivated  as 
a  tool  for  software  engineers  securing  a  program  that  they  have  designed. 

However,  in  our  setting  of  enforcing  purpose  restrictions,  such  access  is  not  always  possible  since  the 
analyzed  system  can  be  a  person  who  could  be  adversarial  and  whose  behavior  the  auditor  can  only  observe. 
On  the  other  hand,  the  auditor  has  information  about  the  purposes  that  the  system  should  be  pursuing. 
Since  the  system  is  a  purpose-driven  agent,  the  auditor  can  understand  its  behavior  in  terms  of  a  POMDP 
model  of  its  environment.  Thus,  while  prior  work  provides  a  definition  of  information  use,  it  does  not  provide 
appropriate  models  or  methods  for  determining  whether  it  occurs  in  our  setting. 

Enforcing Purpose Restrictions. Most prior work on using formal methods for enforcing purpose
restrictions has focused on when observable actions achieve a purpose [1, 8, 2, 9, 32, 18, 29, 13]. That is, they
define  an  action  as  being  for  a  purpose  if  that  action  (possibly  as  part  of  a  chain  of  actions)  results  in  that 
purpose  being  achieved.  Our  work  differs  from  these  works  in  two  ways. 
First, we define an action as being for a purpose when that action is part of a plan for maximizing the
satisfaction  of  that  purpose.  Our  definition  differs  by  treating  purposes  as  rewards  that  can  be  satisfied 
to  varying  degrees  and  by  focusing  on  the  plans  rather  than  outcomes,  which  allows  an  action  to  be  for  a 
purpose  even  if  it  probabilistically  fails  to  improve  it.  The  semantics  of  purpose  we  use  follows  from  informal 
philosophical  inquiry  [41]  and  prior  work  using  Markov  Decision  Processes  to  formalize  purpose  restrictions 
for actions [42]. Jafari et al. offer an alternative view of planning and purposes in which a purpose is a
high-level action related to low-level actions by a plan [17]. Our views are complementary in that theirs picks up
where  ours  leaves  off:  Our  model  of  planning  can  justify  the  plans  that  their  model  accepts  as  given  while 
their  model  allows  for  reasoning  about  the  relationships  among  purposes  with  a  logic. 

Second, we consider information use. While the aforementioned works address restrictions on information
access, they do not have a model of information use, such as noninterference [15]. Hayati and Abadi provide
a  type  system  for  tracking  information  flow  in  programs  with  purpose  restrictions  in  mind  [16].  However, 
their  work  presupposes  that  the  programmer  can  determine  the  purpose  of  a  function  and  provides  no  formal 
guidance  for  making  this  determination. 

Minimal  disclosure  requires  that  the  amount  of  information  used  in  granting  a  request  for  access  should  be 
as  little  as  possible  while  still  achieving  the  purpose  behind  the  request.  This  is  closely  related  to  enforcing 
purpose  restrictions.  However,  purpose  restrictions  do  not  require  the  amount  of  information  used  to  be 
minimal  and  often  involve  purposes  that  are  never  fully  achieved  (e.g.,  more  marketing  is  always  possible). 
Unlike  works  on  minimal  disclosure  [22,  6]  that  model  purposes  as  conditions  that  are  either  satisfied  or  not, 
we  model  them  as  being  satisfied  to  varying  degrees.  Furthermore,  we  model  probabilistic  factors  absent  in 
these works that can lead to an agent’s plan failing. Modeling such failures allows us to identify when
information  use  is  for  a  purpose  despite  not  increasing  the  purpose’s  satisfaction  due  to  issues  outside  of  the 
agent’s  control. 

Planning. Since our formal definition is in terms of planning, automating auditing depends upon
automated plan recognition [38]. We build upon works that use models of planning to recognize plans
(e.g., [4, 3, 34, 35]). The most related work has provided methods of determining when a sequence of
actions is for a purpose (or “goal” in their nomenclature) given a POMDP model of the environment [35].
Our algorithm for auditing is similar to their algorithm. However, whereas their algorithm attempts to
determine the probability that a sequence of actions is for a purpose, we are concerned with whether a
use  of  information  could  be  for  a  purpose.  Thus,  we  must  first  develop  a  formalism  for  information  use. 
We  must  also  concern  ourselves  with  the  soundness  of  our  algorithm  rather  than  its  accuracy  in  terms  of 
a  predicted  probability.  Additionally,  we  use  traditional  POMDPs  to  model  purposes  that  are  never  fully 
satisfied  instead  of  the  goal  POMDPs  used  in  their  work. 

3  Modeling  Purpose-Driven  Agents 

We  review  the  Partially  Observable  Markov  Decision  Process  (POMDP)  model  and  then  show  how  to 
model  the  above  motivating  example  as  one.  We  start  with  an  agent,  such  as  a  person,  organization,  or 
artificially  intelligent  computer,  that  attempts  to  maximize  the  satisfaction  of  a  purpose.  The  agent  uses  a 
POMDP to plan its actions. The POMDP models the agent’s environment and how its actions affect the
environment’s  state  and  the  satisfaction  of  the  purpose.  The  agent  selects  a  plan  that  optimizes  the  expected 
total  discounted  reward  (degree  of  purpose  satisfaction)  under  the  POMDP.  This  plan  corresponds  to  the 
program  running  the  audited  system. 

POMDPs. To define POMDPs, let Dist(X) denote the space of all distributions over the set X and let $\mathbb{R}$
be the set of real numbers. A POMDP is a tuple $(Q, A, \tau, \rho, O, \nu, \gamma)$ where

• $Q$ is a finite state space representing the states of the agent’s environment;

• $A$, a finite set of actions;

• $\tau : Q \times A \to \mathrm{Dist}(Q)$, a transition function from a state and an action to a distribution over states
representing the possible outcomes of the action;

• $\rho : Q \times A \to \mathbb{R}$, a reward function measuring the immediate impact on the satisfaction of the purpose
when the agent takes the given action in the given state;

• $O$, a finite observation space containing any observations the agent may perceive while performing
actions;

• $\nu : A \times Q \to \mathrm{Dist}(O)$, a distribution over observations given an action and the state resulting from
performing that action; and

• $\gamma$, a discount factor such that $0 < \gamma < 1$.


We say that a POMDP models a purpose if $\rho$ measures the degree to which the purpose is satisfied. To
select actions for that purpose, the agent should select those that maximize its expected total discounted
reward, $\mathbb{E}\left[\sum_{i=0}^{\infty} \gamma^i u_i\right]$, where $i$ represents time and $u_i$ the reward from the agent’s $i$th action.
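As a small illustration of the discounted sum above, the following Python sketch computes it for a finite prefix of rewards. The function name `discounted_return` and the sample rewards are assumptions made for illustration; the model itself sums over an unbounded number of steps.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**i * rewards[i] over a finite list of rewards."""
    total = 0.0
    for i, u in enumerate(rewards):
        total += (gamma ** i) * u
    return total


# Hypothetical reward sequence: three steps that each earn reward 1.
example = discounted_return([1, 1, 1], gamma=0.5)
print(example)  # 1 + 0.5 + 0.25 = 1.75
```

Because the discount factor is below 1, later rewards contribute less, so an agent prefers earning the same reward sooner.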

This  goal  is  complicated  by  the  agent  not  knowing  a  priori  which  of  the  possible  states  of  the  POMDP 
is  the  current  state  of  its  environment.  Rather  it  holds  beliefs  about  which  state  is  the  current  state.  In 
particular,  the  agent  assigns  a  probability  to  each  state  q  according  to  how  likely  the  agent  believes  that  the 
current state is the state $q$. A belief state $\beta$ captures these beliefs as a distribution over states of $Q$ (i.e.,
$\beta \in \mathrm{Dist}(Q)$). An agent updates its belief state as it performs actions and makes observations. When an
agent takes the action $a$ and makes the observation $o$ starting with the beliefs $\beta$, the agent develops the new
beliefs $\beta'$ where $\beta'(q')$ is the probability that $q'$ is the next state.

We define $\mathrm{up}_m(\beta, a, o)$ to equal the updated beliefs $\beta'$. The belief state $\beta'$ assigns to the state $q'$ the probability
$\beta'(q') = \Pr[Q'=q' \mid O=o, A=a, B=\beta]$ where $Q'$ is a random variable over next states, $B=\beta$ identifies the agent’s current
belief state as $\beta$, $A=a$ identifies the agent’s current action as $a$, and $O=o$ identifies the observation the agent
makes while performing action $a$ as $o$. We may reduce $\mathrm{up}_m(\beta, a, o)$ to the following formula in terms of the
POMDP model:

\[
\mathrm{up}_m(\beta, a, o)(q') = \frac{\nu(a, q')(o) \sum_{q \in Q} \beta(q) * \tau(q, a)(q')}{\sum_{q'' \in Q} \nu(a, q'')(o) \sum_{q \in Q} \beta(q) * \tau(q, a)(q'')}
\]
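The belief update above can be sketched in Python as follows. The dictionary-based encoding of the transition and observation functions, the function name `belief_update`, and the two-state example with a noisy database are all assumptions made for illustration; they are not the representation or the example used in this paper.

```python
def belief_update(beta, a, o, tau, nu):
    """Return up_m(beta, a, o) as a dict from states to probabilities.

    beta: dict state -> probability (must list every state)
    tau:  dict (state, action) -> dict next_state -> probability
    nu:   dict (action, next_state) -> dict observation -> probability
    """
    states = list(beta)
    unnormalized = {}
    for q_next in states:
        # Probability of reaching q_next from the current beliefs.
        reach = sum(beta[q] * tau[(q, a)].get(q_next, 0.0) for q in states)
        # Weight by the probability of observing o in q_next.
        unnormalized[q_next] = nu[(a, q_next)].get(o, 0.0) * reach
    norm = sum(unnormalized.values())
    if norm == 0.0:
        raise ValueError("observation has probability zero under these beliefs")
    return {q: w / norm for q, w in unnormalized.items()}


# Hypothetical example: a lookup that reports the true state 80% of the time.
tau = {("f", "lookup"): {"f": 1.0}, ("m", "lookup"): {"m": 1.0}}
nu = {
    ("lookup", "f"): {"says_f": 0.8, "says_m": 0.2},
    ("lookup", "m"): {"says_f": 0.2, "says_m": 0.8},
}
updated = belief_update({"f": 0.5, "m": 0.5}, "lookup", "says_f", tau, nu)
print(updated)
```

Starting from uniform beliefs, observing `says_f` shifts the belief toward state `f` in proportion to how reliable the observation is.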


To  maximize  its  expected  total  discounted  reward,  the  agent  does  not  need  to  track  its  history  of  actions 
and  observations  independently  of  its  beliefs  as  such  beliefs  are  a  sufficient  statistic.  Thus,  the  agent  need 
only consider, for each possible belief $\beta$ it can have, what action it would perform. That is, the agent can
plan  by  selecting  a  strategy:  a  function  from  the  space  of  beliefs  Dist(Q)  to  the  space  of  actions  A.  (We  use 
the  word  “strategy”  instead  of  the  more  common  “policy”  to  avoid  confusion  with  privacy  policies.) 

The goal of the agent is to find the optimal strategy. By the Bellman equation [7], the expected value of a
belief state $\beta$ under a strategy $\sigma$ is

\[
V_m(\sigma, \beta) = R_m(\beta, \sigma(\beta)) + \gamma \sum_{o \in O} N_m(\beta, \sigma(\beta))(o) * V_m(\sigma, \mathrm{up}_m(\beta, \sigma(\beta), o)) \tag{1}
\]


where $R_m$ and $N_m$ are $\rho$ and $\nu$ raised to work over beliefs: $R_m(\beta, a) = \sum_{q \in Q} \beta(q) * \rho(q, a)$ and
$N_m(\beta, a)(o) = \sum_{q, q' \in Q} \beta(q) * \tau(q, a)(q') * \nu(a, q')(o)$. A strategy $\sigma$ is optimal if it maximizes $V_m$ for all belief states, that
is, if for all $\beta$, $V_m(\sigma, \beta)$ is equal to $V^*_m(\beta) = \max_{\sigma'} V_m(\sigma', \beta)$. Prior work has provided algorithms for finding
optimal strategies by reducing the problem to one of finding an optimal strategy for a related Markov
Decision Process (MDP) that uses these belief states as its state space (e.g., [40]). (For a survey, see [27].)
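To make equation (1) concrete, the sketch below evaluates a fixed strategy by unrolling the recursion a finite number of steps. This truncation is an approximation chosen for illustration, not one of the cited solution algorithms, and the name `evaluate_strategy`, the dictionary encoding, and the one-state example are assumptions.

```python
def evaluate_strategy(strategy, beta, horizon, states, observations,
                      tau, rho, nu, gamma):
    """Approximate V_m(strategy, beta) by unrolling equation (1) `horizon` times."""
    if horizon == 0:
        return 0.0
    a = strategy(beta)
    # R_m(beta, a): expected immediate reward under the current beliefs.
    value = sum(beta[q] * rho[(q, a)] for q in states)
    for o in observations:
        # Joint weight of landing in q_next and observing o.
        joint = {
            q_next: nu[(a, q_next)].get(o, 0.0)
            * sum(beta[q] * tau[(q, a)].get(q_next, 0.0) for q in states)
            for q_next in states
        }
        prob_o = sum(joint.values())  # N_m(beta, a)(o)
        if prob_o > 0.0:
            # up_m(beta, a, o): normalize the joint weights.
            next_beta = {q: w / prob_o for q, w in joint.items()}
            value += gamma * prob_o * evaluate_strategy(
                strategy, next_beta, horizon - 1, states, observations,
                tau, rho, nu, gamma)
    return value


# Hypothetical one-state POMDP in which every step earns reward 1.
states, observations = ["s"], ["o"]
tau = {("s", "a"): {"s": 1.0}}
rho = {("s", "a"): 1.0}
nu = {("a", "s"): {"o": 1.0}}
v = evaluate_strategy(lambda beta: "a", {"s": 1.0}, 3,
                      states, observations, tau, rho, nu, 0.5)
print(v)  # 1 + 0.5 + 0.25 = 1.75
```

The recursion branches on every observation, so its cost grows exponentially with the horizon; it is meant only to show how the terms of equation (1) fit together.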


Example. We can formalize the motivating example provided in Section 1 as a POMDP $m_{ex}$. Here,
we  provide  an  overview  that  is  sufficient  for  understanding  the  rest  of  the  paper;  the  appendix  provides 
additional  details. 

For  simplicity,  we  assume  that  the  only  information  relevant  to  advertising  is  the  gender  of  the  visitor. 
Thus,  the  state  space  Q  is  determined  by  three  factors:  the  visitor’s  gender,  the  gender  (if  any)  recorded  in 
the  database,  and  what  advertisement  (if  any)  the  network  has  shown  to  the  visitor. 
Also for simplicity, we assume that the network is choosing among three advertisements. We use the
action space $A = \{\mathrm{lookup}, \mathrm{ad}_1, \mathrm{ad}_2, \mathrm{ad}_3\}$. The actions $\mathrm{ad}_1$, $\mathrm{ad}_2$, and $\mathrm{ad}_3$ correspond to the network showing
the visitor one of the three possible advertisements while lookup corresponds to the network looking up
information on the visitor. We presume $\mathrm{ad}_1$ is the best for females and the worst for males, $\mathrm{ad}_3$ is the best
for males and the worst for females, and $\mathrm{ad}_2$ strikes a middle ground. In particular, we use $\rho(q, \mathrm{ad}_1) = 9$ for a
state $q$ in which the visitor is a female and has not yet seen an ad. The reward 9 could refer to a measure of
the click-through rate or the average preference assigned to the ad by females during market research. If the
visitor were instead a male, the reward would be 3. For $\mathrm{ad}_3$, the rewards are reversed with 3 for females and
9 for males. For $\mathrm{ad}_2$, the reward is 7 for both genders. The action lookup or showing a second ad produces
a reward of zero. We use a discounting factor of $\gamma = 0.9$.

The function $\tau$ shows how actions change the environment’s state while $\nu$ shows how observations
accompany these actions. The function $\tau$ enforces that showing an ad changes the state into one in which showing a second
ad produces no further rewards. It also specifies that performing lookup does not change the state of the
environment. On the other hand, $\nu$ shows that lookup can change the state of the agent’s knowledge. In
particular, it shows that performing lookup produces an observation $(d, a)$. The observation reveals that the
database holds data $d$ about the visitor’s gender and $a$ about what, if any, ad the visitor has seen. Thus,
the observation space is $O = \{\mathrm{f}, \mathrm{m}, \bot\} \times \{\mathrm{ad}_1, \mathrm{ad}_2, \mathrm{ad}_3, \emptyset\}$ with f for the database showing a female, m for a
male, $\bot$ for no gender entry, $\mathrm{ad}_i$ for the visitor having seen $\mathrm{ad}_i$, and $\emptyset$ for the visitor having not seen an ad.

How the network will behave depends upon the network’s initial beliefs $\beta_{ex1}$. We presume that the
network believes its database’s entries to be correct, that it has not shown an advertisement to the visitor
yet, and that visitors are equally likely to be female or male. Under these assumptions, the optimal plan for
the network is to first check whether the database contains information about the visitor. If the database
records that the visitor is a female, then the network shows her $\mathrm{ad}_1$. If it records a male, the network shows
$\mathrm{ad}_3$. If the database does not contain the visitor’s gender (holds $\bot$), then the network shows $\mathrm{ad}_2$. The
optimal  plan  is  not  constrained  as  to  what  the  agent  does  after  showing  the  advertisement  as  it  does  not 
affect  the  reward.  (We  return  to  this  point  later  when  we  consider  non-redundancy  in  Section  5.) 
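The arithmetic behind this plan can be checked with a short Python sketch using the rewards and discount factor stated above. The parameter `p_missing` (the chance that the database holds no gender entry) and the assumption that a missing entry is independent of the visitor’s gender are hypothetical simplifications; the appendix’s actual probabilities are not reproduced here.

```python
GAMMA = 0.9
# Rewards from the example: (visitor gender, ad) -> reward for the first ad shown.
REWARD = {
    ("f", "ad1"): 9, ("m", "ad1"): 3,
    ("f", "ad2"): 7, ("m", "ad2"): 7,
    ("f", "ad3"): 3, ("m", "ad3"): 9,
}


def value_show_now(ad, p_female=0.5):
    """Expected reward of showing `ad` immediately, without a lookup."""
    return p_female * REWARD[("f", ad)] + (1 - p_female) * REWARD[("m", ad)]


def value_lookup_then_show(p_missing=0.0, p_female=0.5):
    """Expected discounted reward of lookup followed by the matching ad.

    Assumes the database is correct when it has an entry and that the
    network falls back to ad2 when the entry is missing (hypothetical model).
    """
    matched = p_female * REWARD[("f", "ad1")] + (1 - p_female) * REWARD[("m", "ad3")]
    fallback = value_show_now("ad2", p_female)
    # lookup itself earns 0; the ad's reward is discounted by one step.
    return 0 + GAMMA * ((1 - p_missing) * matched + p_missing * fallback)


for ad in ("ad1", "ad2", "ad3"):
    print(ad, value_show_now(ad))
print("lookup first:", value_lookup_then_show())
```

Under these assumptions, showing an ad immediately yields at most 7 (for ad2), while looking up the gender first yields about 8.1 when the database always has an entry, so the one-step delay is worth paying. As `p_missing` grows, the advantage of the lookup shrinks.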

This  optimal  plan  characterizes  the  form  of  the  set  of  optimal  strategies.  The  set  contains  multiple  optimal 
strategies  since  the  network  is  unconstrained  in  the  actions  it  performs  after  showing  the  advertisement.  The 
optimal  strategies  must  also  specify  how  the  network  would  behave  under  other  possible  beliefs  it  could  have 
had.  For  example,  if  the  network  believed  that  all  visitors  are  females  regardless  of  what  its  database  records, 
then it would always show $\mathrm{ad}_1$ without first checking its database.

Intuitively,  using  any  of  these  optimal  strategies  would  violate  the  privacy  policy  prohibiting  using  gender 
for  marketing.  The  reason  is  that  the  network  selected  which  advertisement  to  show  using  the  database’s 
information  about  the  visitor’s  gender. 

We expect the network constrained to obeying the policy will show $\mathrm{ad}_2$ to all visitors (presuming
approximately equal numbers of female and male visitors). Our reasoning is that the network must plan as
though it does not know and cannot learn the visitor’s gender. In this state of simulated ignorance, the best
plan the network can select is the middle ground of $\mathrm{ad}_2$. The next section formalizes this planning under
simulated  ignorance. 


4  Constraining  POMDPs  for  Information  Use 

We  now  provide  a  formal  characterization  of  how  an  agent  pursuing  a  purpose  should  behave  when  prohibited 
from  using  a  class  of  information.  Recall  the  intuition  that  using  information  is  using  a  distinction  and  that 
not  using  it  corresponds  to  ignoring  the  distinction.  We  use  this  idea  to  model  sensitive  information  with 
an equivalence relation $\equiv$. We set $o_1 \equiv o_2$ for any two observations $o_1$ and $o_2$ that differ only by sensitive
information.

From $\equiv$ and a POMDP $m$, we construct a POMDP $m/{\equiv}$ that ignores the prohibited information. For each
equivalence class of $\equiv$, $m/{\equiv}$ will conflate its members by treating all observations in it as indistinguishable
from one another. To ignore these distinctions, on observing $o$, the agent updates its belief state as though
it has seen some element of ${\equiv}[o]$ but is unsure of which one, where ${\equiv}[o]$ is the equivalence class that holds
the observation $o$.

To make this formal, we define a quotient POMDP $m/{\equiv}$ that uses a quotiented space of observations. Let
$O/{\equiv}$ be the set of equivalence classes of $O$ under $\equiv$. Let $\nu/{\equiv}$ give the probability of seeing any observation
of an equivalence class: $\nu/{\equiv}(a, q')(\mathcal{O}) = \sum_{o \in \mathcal{O}} \nu(a, q')(o)$ where $\mathcal{O}$ is an equivalence class in $O/{\equiv}$. Given
$m = (Q, A, \tau, \rho, O, \nu, \gamma)$, let $m/{\equiv}$ be $(Q, A, \tau, \rho, O/{\equiv}, \nu/{\equiv}, \gamma)$.
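The construction of the quotiented observation function can be sketched in Python as below. Representing each equivalence class by a label returned from a `class_of` function, the dictionary encoding, and the sample probabilities are assumptions made for illustration only.

```python
def quotient_nu(nu, class_of):
    """Build the quotiented observation function from nu.

    nu:       dict (action, next_state) -> dict observation -> probability
    class_of: function mapping an observation to a hashable label that
              identifies its equivalence class
    """
    quotiented = {}
    for key, dist in nu.items():
        merged = {}
        for o, p in dist.items():
            label = class_of(o)
            # Sum the probabilities of all observations in the same class.
            merged[label] = merged.get(label, 0.0) + p
        quotiented[key] = merged
    return quotiented


# Hypothetical observation distribution: (gender entry, ad seen) pairs,
# with "bot" standing for a missing gender entry.
nu = {
    ("lookup", "q"): {("f", "none"): 0.5, ("m", "none"): 0.3, ("bot", "none"): 0.2},
}

# Conflate every gender entry: the class is determined by the ad component.
nu_q = quotient_nu(nu, lambda o: o[1])
print(nu_q[("lookup", "q")])
```

Only the observation space and observation function change; the states, actions, transitions, rewards, and discount factor carry over unchanged, matching the tuple given above.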

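Proposition 1. For all POMDPs $m$ and equivalences $\equiv$, $m/{\equiv}$ is a POMDP.

Proof. We prove that $\nu/{\equiv}$ produces probability distributions over $O/{\equiv}$. For all $a$ and $q'$,

\[
\sum_{\mathcal{O} \in O/\equiv} \nu/{\equiv}(a, q')(\mathcal{O}) = \sum_{\mathcal{O} \in O/\equiv} \sum_{o \in \mathcal{O}} \nu(a, q')(o) = \sum_{o \in O} \nu(a, q')(o) = 1
\]

follows from $O/{\equiv}$ being a partition of $O$ and from $\nu(a, q')$ being a distribution over $O$. For all $\mathcal{O} \in O/{\equiv}$,
$0 \le \nu/{\equiv}(a, q')(\mathcal{O}) \le 1$ since $\nu/{\equiv}(a, q')(\mathcal{O}) = \sum_{o \in \mathcal{O}} \nu(a, q')(o)$ and $\mathcal{O} \subseteq O$. Thus, $\nu/{\equiv}(a, q')$ is a probability
distribution. □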


Example. Returning to the example POMDP of Section 3, the policy governing the network states that
the network will not use the database’s entry about the visitor’s gender for determining the advertisement
to show the visitor. The auditor must decide how to formally model this restriction. One way would be to
define $\equiv_{ex}$ such that for all $g$ and $g'$ in $\{\mathrm{f}, \mathrm{m}, \bot\}$, and $a$ in $\{\mathrm{ad}_1, \mathrm{ad}_2, \mathrm{ad}_3, \emptyset\}$, $(g, a) \equiv_{ex} (g', a)$, conflating
the gender for all observations. Under this requirement, $m_{ex}/{\equiv_{ex}}$ will be such that the optimal strategy will
be determined solely by the network’s initial beliefs and performing the action lookup will be of no benefit.
Any optimal strategy for $m_{ex}/{\equiv_{ex}}$ will call for performing $\mathrm{ad}_2$ from the initial beliefs $\beta_{ex1}$ discussed above.

Alternatively,  the  auditor  might  conclude  that  the  policy  only  forces  the  network  to  ignore  whether  the 
database  records  the  visitor  as  a  female  or  male  and  not  whether  the  database  contains  this  information. 
In this case, the auditor would use a different equivalence $\equiv'_{ex}$ such that $(f, a) \equiv'_{ex} (m, a)$ but
$(f, a) \not\equiv'_{ex} (\bot, a) \not\equiv'_{ex} (m, a)$ for all $a$. Under the initial beliefs $\beta_{ex1}$, the network would behave identically
under $\equiv_{ex}$ and $\equiv'_{ex}$. However, if the network's beliefs were such that it is much more likely to not know a female's
gender than a male's, then it might choose to show ad1 instead of ad2 in the case of observing $(\bot, \emptyset)$.


The next proposition proves that we constructed the POMDP $m/{\equiv}$ so that beliefs are updated as if
the agent only learns that some element of an equivalence class of observations was observed but not which
one. That is, we prove that the updated belief $\mathrm{up}_{m/\equiv}(\beta, a, {\equiv}[o])(q')$ is equal to the probability that the next
environmental state is $q'$ given the distribution $\beta$ over possible last states, that the last action was $a$, and
that the observation was a member of ${\equiv}[o]$. Recall that $Q'$ is a random variable over the next state while
$O$, $A$, and $B$ identify the last observation, action, and belief state, respectively.

Proposition 2. For all POMDPs $m$, equivalences $\equiv$, beliefs $\beta$, actions $a$, observations $o$, and states $q'$,
$\mathrm{up}_{m/\equiv}(\beta, a, {\equiv}[o])(q') = \Pr[Q'=q' \mid O \in {\equiv}[o], A=a, B=\beta]$.

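Proof. For all $m$, $\equiv$, $\beta$, $a$, $o$, and $q'$,

$$
\begin{aligned}
\mathrm{up}_{m/\equiv}(\beta, a, {\equiv}[o])(q')
&= \frac{\nu/{\equiv}(a, q')({\equiv}[o]) \sum_{q \in Q} \beta(q)\,\tau(q, a)(q')}{\sum_{q'' \in Q} \nu/{\equiv}(a, q'')({\equiv}[o]) \sum_{q \in Q} \beta(q)\,\tau(q, a)(q'')} \\
&= \frac{\sum_{o_1 \in {\equiv}[o]} \nu(a, q')(o_1) \sum_{q \in Q} \beta(q)\,\tau(q, a)(q')}{\sum_{o_1 \in {\equiv}[o]} \sum_{q'' \in Q} \nu(a, q'')(o_1) \sum_{q \in Q} \beta(q)\,\tau(q, a)(q'')} \\
&= \frac{\Pr[O \in {\equiv}[o] \mid Q'=q', A=a, B=\beta]\,\Pr[Q'=q' \mid A=a, B=\beta]}{\Pr[O \in {\equiv}[o] \mid A=a, B=\beta]} \\
&= \Pr[Q'=q' \mid O \in {\equiv}[o], A=a, B=\beta]
\end{aligned}
$$

since $\Pr[O \in {\equiv}[o] \mid Q'=q', A=a, B=\beta] = \Pr[O \in {\equiv}[o] \mid Q'=q', A=a]$.

□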
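
The belief update characterized by Proposition 2 can be sketched in Python as follows. This is an illustrative sketch only: the function signature, the dictionary encoding of distributions, and the two-state toy environment are assumptions rather than part of the formal development.

```python
def quotient_belief_update(beta, a, obs_class, states, tau, nu):
    """Belief update when the agent learns only that the observation lies in obs_class.

    beta      -- dict state -> probability (current belief)
    obs_class -- set of observations forming one equivalence class
    tau(q, a) -- dict next_state -> probability
    nu(a, q2) -- dict observation -> probability
    """
    unnorm = {}
    for q2 in states:
        reach = sum(beta[q] * tau(q, a).get(q2, 0.0) for q in states)
        see = sum(nu(a, q2).get(o, 0.0) for o in obs_class)
        unnorm[q2] = see * reach
    total = sum(unnorm.values())
    if total == 0:
        raise ValueError("observation class has probability zero under this belief")
    return {q2: p / total for q2, p in unnorm.items()}

# Toy environment (hypothetical): the state is the visitor's gender, no action
# changes it, and every observation reveals it exactly.
states = ["f", "m"]
tau = lambda q, a: {q: 1.0}
nu = lambda a, q2: {q2: 1.0}
beta = {"f": 0.25, "m": 0.75}

# Conflating "f" with "m" leaves the belief unchanged; keeping them apart reveals the state.
same = quotient_belief_update(beta, "lookup", frozenset({"f", "m"}), states, tau, nu)
revealed = quotient_belief_update(beta, "lookup", frozenset({"f"}), states, tau, nu)
```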

Propositions 1 and 2 show that $m/{\equiv}$ is a POMDP that ignores the distinctions among observations that
only differ by sensitive information. They justify the following definition, which explains how a purpose-driven
agent should act when prohibited from using certain information. They show that it correctly prevents
the use of the prohibited information. The definition's appeal to optimizing a POMDP is justified by our
prior work showing that an action is for a purpose when that action is selected as part of a plan optimizing
the satisfaction of that purpose [42]. We extend this result to information by concluding that information
used to select an action is used for that action's purpose.

Definition 1 (Cognitive). An agent obeys the purpose restriction to perform actions for the purpose modeled
by the POMDP $m$ without using the information modeled by $\equiv$ iff the agent selects a strategy by optimizing
$m/{\equiv}$.

We  call  the  above  definition  cognitive  since  it  refers  to  the  strategy  selected  by  the  agent  as  part  of  a 
cognitive  process  that  the  auditor  cannot  measure.  Rather,  the  auditor  can  only  view  the  agent’s  external 
behavior  and  visible  aspects  of  the  environment.  That  is,  the  auditor  can  only  view  the  agent’s  actions  and 
observations,  which  we  refer  to  collectively  as  the  agent’s  execution. 

We can formalize the agent's execution using a function exe. Even when the agent uses the POMDP
$m/{\equiv}$ with observation space $O/{\equiv}$ to select a strategy, the actual observations the agent makes lie in $O$,
complicating exe. We recursively define $\mathrm{exe}(m, \equiv, \sigma, \beta, \vec{o})$ to be the agent's execution that arises from it
employing a strategy $\sigma$ while observing a sequence of observations $\vec{o} = [o_1, \ldots, o_n]$ in $O^*$, starting with beliefs
$\beta$ for a POMDP $m/{\equiv}$. For the empty sequence $[]$ of observations, $\mathrm{exe}(m, \equiv, \sigma, \beta, []) = [\sigma(\beta)]$ since the agent
can only make one action before needing to wait for the next observation and updating its beliefs. For non-empty
sequences $o{:}\vec{o}$, it is equal to $\sigma(\beta){:}o{:}\mathrm{exe}(m, \equiv, \sigma, \mathrm{up}_{m/\equiv}(\beta, \sigma(\beta), {\equiv}[o]), \vec{o})$ where $x{:}y$ denotes prepending
element $x$ to the sequence $y$.
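
The recursive definition of exe can be transcribed almost directly into Python. The sketch below is a hedged illustration: the belief update is passed in as a function, and the toy beliefs (tuples of the classes seen so far) and toy strategy are assumptions chosen only to keep the example short.

```python
def exe(sigma, update, class_of, beta, observations):
    """Execution produced by strategy sigma from belief beta on a list of raw observations.

    sigma(beta)          -- action chosen at belief beta
    update(beta, a, cls) -- belief after action a and an observation in class cls
    class_of(o)          -- equivalence class of the raw observation o
    """
    action = sigma(beta)
    if not observations:
        return [action]
    o, rest = observations[0], observations[1:]
    next_beta = update(beta, action, class_of(o))
    # The log keeps the raw observation o even though beliefs only see its class.
    return [action, o] + exe(sigma, update, class_of, next_beta, rest)

# Toy instantiation (hypothetical): a belief is the tuple of classes seen so far.
update = lambda beta, a, cls: beta + (cls,)
sigma = lambda beta: "lookup" if not beta else "ad2"
class_of = lambda obs: obs[1]  # drop the gender component

run_f = exe(sigma, update, class_of, (), [("f", None)])
run_m = exe(sigma, update, class_of, (), [("m", None)])
```

The two runs record different raw observations but the same actions, since the strategy only sees equivalence classes.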

A single execution $e$ can be consistent with both an optimal strategy for $m/{\equiv}$ and a strategy that is not
optimal for $m/{\equiv}$. Consider, for example, the execution $e = [ad_2] = \mathrm{exe}(m_{ex}, \equiv_{ex}, \sigma, \beta_{ex1}, [])$ that arises from
an optimal strategy $\sigma$ for $m_{ex}/{\equiv_{ex}}$. This execution can also arise from the agent planning for a different
purpose,  such  as  maximizing  kickbacks  for  showing  certain  ads,  provided  that  ad2  also  just  so  happens  to 
maximize  that  purpose.  Since  the  auditor  only  observes  the  execution  e  and  not  the  cognitive  process  that 
selected  the  action  ad2,  the  auditor  cannot  know  by  which  process  the  agent  selected  the  ad.  Thus,  the 
auditor  cannot  determine  from  an  execution  that  an  agent  obeyed  a  purpose  restriction  under  Definition  1. 

Some  auditors  may  find  this  fundamental  limitation  immaterial  since  such  an  agent’s  actions  are  still 
consistent  with  an  allowed  strategy.  Since  the  actual  reasons  behind  the  agent  selecting  those  actions  do 
not  affect  the  environment,  an  auditor  might  not  find  concerning  an  agent  doing  the  right  actions  for  the 
wrong  reasons.  To  capture  this  more  consequentialist  view  of  compliance,  we  provide  a  weaker  definition 
that  focuses  on  only  the  agent’s  execution. 

Definition 2 (Behaviorist). An agent performing execution $e$ obeys the purpose restriction to perform
actions for the purpose modeled by the POMDP $m$ and initial beliefs $\beta_1$ without using the information
modeled by the equivalence relation $\equiv$ given the observations $\vec{o}$ iff $e = \mathrm{exe}(m, \equiv, \sigma, \beta_1, \vec{o})$ for some $\sigma$ that is
an optimal strategy of $m/{\equiv}$.


5  Auditing  Algorithm 

Under  the  behaviorist  definition,  to  determine  whether  an  agent  obeyed  a  prohibition  against  using  certain 
information  for  a  purpose  pursued  by  the  agent,  the  auditor  can  compare  the  agent’s  behaviors  to  the 
appropriate strategies. The auditor records the agent's execution in a log $\ell$ that shows the actions and
observations  of  the  agent.  For  example,  databases  for  electronic  medical  records  log  many  of  the  actions  and 
observations  of  healthcare  providers.  The  auditor  may  then  compare  the  recorded  behavior  to  that  dictated 
by  Definition  2,  i.e.,  to  the  optimal  strategies  for  the  quotient  POMDP  modeling  the  purpose  while  ignoring 
disallowed  information. 

Given our formal model, we can automate the comparison of the agent's behavior to the allowable
behavior. We use an algorithm Audit that takes as inputs a POMDP $m$, an equivalence relation $\equiv$, and
a log $\ell = [a_1, o_1, a_2, o_2, \ldots, a_n, o_n]$ such that the audited agent is operating in the environment $m$ under a
policy prohibiting information as described by $\equiv$ and took action $a_i$ followed by observation $o_i$ for all $i \le n$.
For simplicity, we assume that $\ell$ records all relevant actions and observations. Audit returns whether the
agent's behavior, as recorded in $\ell$, is inconsistent with optimizing the POMDP $m/{\equiv}$.

Audit operates by first constructing the quotient POMDP $m/{\equiv}$ from $m$ and $\equiv$. Next, similar to a prior
algorithm [35], for each $i$, Audit checks whether performing the recorded action $a_i$ in the current belief state
$\beta_i$ is optimal under $m/{\equiv}$. The algorithm constructs these belief states from the observations and initial belief
state $\beta_1$. Due to the complexity of solving POMDPs [31], we use an approximation algorithm to solve for
the value of performing $a_i$ in $\beta_i$ (denoted $Q^*_{m/\equiv}(\beta_i, a_i)$) and the optimal value $V^*_{m/\equiv}(\beta_i)$. Unlike prior work,
for soundness, we require an approximation algorithm solvePOMDP that produces both lower bounds $V^*_{low}$
and upper bounds $V^*_{up}$ on $V^*_{m/\equiv}(\beta_i)$. Many such algorithms exist (e.g., [49, 39, 20, 33]). For each $\beta_i$ and $a_i$
in $\ell$, Audit checks whether these bounds show that $Q^*_{m/\equiv}(\beta_i, a_i)$ is strictly less than $V^*_{m/\equiv}(\beta_i)$. If so, then
the action $a_i$ is sub-optimal for $\beta_i$ and Audit returns true. Pseudo-code for Audit follows:

    Audit(⟨Q, A, τ, ρ, O, ν, γ⟩, ≡, β_1, [a_1, o_1, a_2, o_2, ..., a_n, o_n]):
        m' := ⟨Q, A, τ, ρ, O/≡, ν/≡, γ⟩
        ⟨V*_low, V*_up⟩ := solvePOMDP(m')
        for (i := 1; i ≤ n; i++):
            if (Q*_up(V*_up, β_i, a_i) < V*_low(β_i)):
                return true
            β_{i+1} := up_{m/≡}(β_i, a_i, ≡[o_i])
        return false

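where $Q^*_{up}(V^*_{up}, \beta, a)$ is a function that uses $V^*_{up}$ to return an upper bound on $Q^*_{m/\equiv}(\beta, a)$:

$$Q^*_{up}(V^*_{up}, \beta, a) = R_{m'}(\beta, a) + \gamma \sum_{\mathcal{O} \in O/\equiv} N_{m'}(\beta, a)(\mathcal{O}) \cdot V^*_{up}(\mathrm{up}_{m'}(\beta, a, \mathcal{O}))$$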
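
The pseudo-code above can be rendered as a short Python sketch. It assumes that the value bounds, the expected reward, the distribution over next observation classes, and the belief update are supplied as functions; these parameter names and the one-step toy problem are illustrative assumptions, and no particular POMDP solver is implied.

```python
def q_up(v_up, beta, a, reward, next_obs_dist, update, gamma):
    """Upper bound on Q*(beta, a) obtained from an upper bound v_up on V*."""
    future = sum(p * v_up(update(beta, a, cls))
                 for cls, p in next_obs_dist(beta, a).items())
    return reward(beta, a) + gamma * future

def audit(beta1, log, v_low, v_up, reward, next_obs_dist, update, class_of, gamma):
    """Return True iff the bounds prove some logged action suboptimal for the quotient POMDP.

    log is a list of (action, observation) pairs; v_low and v_up are assumed to
    bound the optimal value function of the quotient POMDP from below and above.
    """
    beta = beta1
    for a, o in log:
        if q_up(v_up, beta, a, reward, next_obs_dist, update, gamma) < v_low(beta):
            return True  # a is provably suboptimal at beta
        beta = update(beta, a, class_of(o))
    return False

# Toy one-step problem (hypothetical): beliefs are just labels, and showing
# ad2 is worth more than showing ad1 from the initial belief.
rewards = {("start", "ad1"): 0.3, ("start", "ad2"): 0.5}
reward = lambda beta, a: rewards.get((beta, a), 0.0)
next_obs_dist = lambda beta, a: {"shown": 1.0}
update = lambda beta, a, cls: "done"
exact_value = lambda beta: 0.5 if beta == "start" else 0.0  # used as both bounds

common = dict(v_low=exact_value, v_up=exact_value, reward=reward,
              next_obs_dist=next_obs_dist, update=update,
              class_of=lambda o: o, gamma=0.9)
flag_ad1 = audit("start", [("ad1", "shown")], **common)
flag_ad2 = audit("start", [("ad2", "shown")], **common)
```

With looser bounds the first check could fail to fire, which is the source of incompleteness discussed below.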

Theorem 1 (Soundness). If Audit returns true, then the agent did not follow an optimal strategy for $m/{\equiv}$,
violating both Definitions 1 and 2.

Proof. If the algorithm returns true, then for some $i$, $Q^*_{m/\equiv}(\beta_i, a_i) \le Q^*_{up}(V^*_{up}, \beta_i, a_i) < V^*_{low}(\beta_i) \le V^*_{m/\equiv}(\beta_i)$.
This implies that $a_i$ is suboptimal at belief state $\beta_i$ and the agent did not follow an optimal strategy for the
allowed purpose using only the allowed information. □

Thus,  if  Audit  returns  true,  either  the  agent  optimized  some  other  purpose,  used  information  it  should 
not  have,  used  a  different  POMDP  model  of  its  environment,  or  failed  to  correctly  optimize  the  POMDP. 
Each  of  these  possibilities  should  concern  the  auditor  and  is  worthy  of  further  investigation. 

If  the  algorithm  returns  false,  then  the  auditor  cannot  find  the  agent’s  behavior  inconsistent  with  an 
optimal  strategy  and  should  spend  his  time  auditing  other  agents.  However,  Audit  is  incomplete  and 
such  a  finding  does  not  mean  that  the  agent  surely  performed  its  actions  for  the  purpose  without  using 
the  prohibited  information.  For  the  cognitive  definition,  incompleteness  is  unavoidable  since  the  definition 
depends  upon  cognitive  constructs  that  the  auditor  cannot  measure.  For  example,  recall  that  the  network 
could  display  the  execution  e  =  [ad2]  either  from  performing  the  allowed  optimization  or  by  performing  some 
disallowed  optimization  that  also  results  in  the  action  ad2  being  optimal. 

For the behaviorist definition, incompleteness results since a better approximation might actually show
that $Q^*_{m/\equiv}(\beta_i, a_i) < V^*_{m/\equiv}(\beta_i)$ for some $i$. In principle this source is avoidable by using an exact POMDP
solver instead of an approximate one. However, exactly solving some POMDPs is undecidable [21].
Nevertheless, we can prove that this inability is the only source of incompleteness.

Theorem 2 (Qualified Completeness). If Audit using an oracle to exactly solve POMDPs returns false, then
the agent obeyed the purpose restriction according to the behaviorist definition (Definition 2).

Proof. Assume that the algorithm returns false. Then, for every $i$, it must be the case that
$Q^*_{up}(V^*_{up}, \beta_i, a_i) \not< V^*_{low}(\beta_i)$. Since an oracle returns exact results for $V^*_{up}$ and $V^*_{low}$,
$Q^*_{m/\equiv}(\beta_i, a_i) = Q^*_{up}(V^*_{up}, \beta_i, a_i)$ and $V^*_{low}(\beta_i) = V^*_{m/\equiv}(\beta_i)$. Thus, for all $i$, $Q^*_{m/\equiv}(\beta_i, a_i) \ge V^*_{m/\equiv}(\beta_i)$.
Thus, for all $i$, $a_i$ is optimal at belief state $\beta_i$ and the agent's actions are consistent with following an
optimal strategy for $m/{\equiv}$. □

Other  Purpose  Restrictions.  Audit  is  specialized  for  determining  whether  or  not  the  audited  agent 
performed  its  actions  for  a  purpose  without  using  some  prohibited  information.  While  such  a  question 
is  relevant  to  an  internal  compliance  officer  auditing  employees,  it  does  not  correspond  to  the  purpose 
restrictions  found  in  outward-facing  privacy  policies. 

One  type  of  restriction  found  in  such  policies  is  the  not-for  restriction  prohibiting  information  from 
being  used  for  a  purpose.  For  example,  Yahoo!  promised  to  not  use  contents  of  emails  for  marketing.  This 
restriction is similar to the condition checked by Audit, but is weaker in that the audited agent may obey it
either  (1)  by  performing  actions  for  that  purpose  without  using  that  information  (which  Audit  checks)  or 
(2)  by  not  performing  actions  for  that  purpose. 

A  second  type  is  the  only-for  restriction,  which  limits  the  agent  to  using  a  class  of  information  only  for 
a  purpose.  For  example,  HIPAA  requires  that  medical  records  are  used  only  for  certain  purposes  such  as 
treatment.  It  is  also  weak  in  that  the  agent  can  obey  it  either  (1)  by  performing  actions  for  the  purpose 
(which Audit checks using equality for $\equiv$ to allow the agent to use the information) or (2) by not using the
information  in  question  while  performing  actions  for  some  other  purpose. 

For  both  of  these  types,  our  algorithm  can  handle  the  first  option  (1)  for  compliance.  However,  for 
both  these  types,  the  second  option  (2)  for  compliance  involves  an  open-ended  space  of  possible  alternative 
purposes  that  could  have  motivated  the  agent’s  actions.  In  some  cases  (e.g.,  healthcare),  this  space  may  be 
small  enough  to  check  each  alternative  (e.g.,  treatment,  billing,  research,  training)  with  Audit.  In  other 
cases,  the  auditor  might  have  the  authority  to  compel  the  agent  to  explain  what  its  purpose  was.  In  either 
of  these  cases,  the  auditor  could  use  Audit  to  explore  these  alternative  purposes. 

Modeling.  Audit  requires  a  POMDP  that  models  how  various  actions  affect  the  purpose  in  question.  In 
some  cases,  acquiring  such  a  model  may  be  non-trivial.  We  hope  that  future  work  can  ease  the  process  of 
model  construction  using  techniques  from  reinforcement  learning,  such  as  SARSA  [36],  that  automatically 
construct  models  from  observing  the  behavior  of  multiple  agents. 

In  some  cases,  the  auditor  might  be  able  to  compel  the  agent  to  provide  the  POMDP  used.  In  this  case, 
Audit  would  check  whether  the  agent’s  story  is  consistent  with  its  actions. 

Non-Redundancy.  In  our  running  example,  the  actions  of  the  agent  after  showing  the  advertisement  are 
unconstrained.  The  reason  is  that  showing  the  advertisement  will  result  in  the  current  state  of  the  POMDP 
becoming  one  from  which  no  further  rewards  are  possible.  Since  the  only  criterion  of  an  optimal  strategy 
is  its  expected  total  discounted  reward,  a  strategy  may  assign  any  action  to  these  states  without  changing 
whether  it  is  optimal.  However,  none  of  the  actions  in  A  actually  improves  the  satisfaction  of  the  purpose. 
Thus,  intuitively,  the  agent  should  just  stop  instead  of  performing  any  of  them. 

Prior work has formalized this intuition for MDPs using the idea of non-redundancy [42]. We may apply
the same idea to POMDPs. We add to each POMDP a distinguished action stop that indicates that the
agent stops and does nothing more (for the purpose in question). The stop action always produces zero
reward and results in no state change: $\rho(q, \mathsf{stop}) = 0$ and $\tau(q, \mathsf{stop}) = \delta(q)$ for all $q$ in $Q$, where $\delta(q)$ is the
degenerate distribution assigning probability 1 to $q$. An action $a$ other than stop from a belief state $\beta$ is
redundant if it is no better than stopping: $Q^*_{m/\equiv}(\beta, a) \le Q^*_{m/\equiv}(\beta, \mathsf{stop}) = 0$.

A  strategy  is  non-redundant  if  it  never  requires  a  redundant  action  from  any  belief  state.  We  require  that 
the  strategy  that  the  agent  selects  is  not  just  optimal  for  the  total  expected  discounted  reward,  but  also  that 
it  is  non-redundant. 

We modify Audit to enforce this requirement by additionally checking that $Q^*_{up}(V^*_{up}, \beta_i, a_i) > 0$ for each pair
of a belief state $\beta_i$ and an action $a_i$ other than stop in the log $\ell$. If not, Audit has found a redundant action
$a_i$ indicating a violation and returns true.
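
The additional non-redundancy check is small enough to sketch on its own. The helper below assumes the upper bounds on $Q^*$ have already been computed for each logged step; its name and input format are hypothetical.

```python
def redundant_steps(q_up_values, stop="stop"):
    """Return indices of logged steps whose action is provably redundant.

    q_up_values -- list of (action, upper bound on Q*(beta_i, action)) pairs,
                   one per logged step.
    """
    # An upper bound of at most 0 proves the action is no better than stopping.
    return [i for i, (a, q_hi) in enumerate(q_up_values)
            if a != stop and q_hi <= 0]

steps = [("lookup", 0.2), ("ad2", 0.5), ("ad1", 0.0), ("stop", 0.0)]
violations = redundant_steps(steps)
```

Here only the third step is flagged: showing ad1 after an ad has already been shown cannot improve on stopping, while the final stop is always permitted.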

6  Relationship  with  Noninterference

We have provided a definition of information use in terms of a POMDP. Prior work provides the noninterference
definition of information use for automata [15]. In this section, we show that our definition implies
a form of noninterference. In particular, we show that agents using strategies optimizing $m/{\equiv}$ have noninterference
for $\equiv$, which suggests that our definition is sufficiently strong to rule out information use. We start
by reviewing automata and noninterference.

Automaton  Model  of  Systems.  The  agent  using  the  POMDP  to  select  a  strategy  can  implement  that 
strategy  as  a  control  system  or  controller  (e.g.,  [19]).  We  follow  Goguen  and  Meseguer’s  work  and  model 
systems  as  deterministic  automata  [15].  However,  since  we  do  not  analyze  the  internal  structure  of  systems 
(it  is  unavailable  to  the  auditor),  our  approach  can  be  applied  to  other  models.  We  limit  our  discussion  to 
deterministic  systems  since  there  are  many  competing  generalizations  of  noninterference  to  the  nondeter- 
ministic  setting  (e.g.,  [25,  46,  26]),  but  the  main  competitors  collapse  into  standard  noninterference  in  the 
deterministic  case  [11]. 

A system automaton $s = \langle t, r \rangle$ consists of a labeled transition system (LTS) $t$ and a current state $r$.
An LTS $t = \langle \mathcal{R}, O, A, \mathrm{next}, \mathrm{act} \rangle$ describes the automaton's behavior where $\mathcal{R}$ is a set of states; $O$, a set
of observations (inputs); $A$, a set of actions (outputs); $\mathrm{next} : \mathcal{R} \times O \to \mathcal{R}$ is a transition function; and
$\mathrm{act} : \mathcal{R} \to A$ is a function identifying the action that the automaton selects given its current state. The
current state $r \in \mathcal{R}$ changes as the system makes observations and takes actions.

As with POMDPs, an execution of a system $s$ modeled as an automaton corresponds to an interleaving of
observations from the environment and actions taken by the system. Let $\mathrm{exe}(s, \vec{o})$ denote the execution of $s$ on
a sequence $\vec{o}$ of observations. As for POMDPs, we define exe for systems recursively: $\mathrm{exe}(\langle t, r \rangle, []) = [\mathrm{act}(r)]$
and $\mathrm{exe}(\langle t, r \rangle, o{:}\vec{o}) = \mathrm{act}(r){:}o{:}\mathrm{exe}(\langle t, \mathrm{next}(r, o) \rangle, \vec{o})$ where $t = \langle \mathcal{R}, O, A, \mathrm{next}, \mathrm{act} \rangle$.
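
For concreteness, the execution of a system automaton can be sketched in Python as an iterative unrolling of the recursive definition. The three-state controller used as an example is hypothetical.

```python
def exe_system(next_state, act, r, observations):
    """Execution of a deterministic system automaton started in state r."""
    trace = [act(r)]
    for o in observations:
        r = next_state(r, o)    # move to the next internal state
        trace += [o, act(r)]    # record the observation, then the resulting action
    return trace

# Hypothetical controller: look up the visitor, show ad2, then stop.
order = {"init": "looked", "looked": "done", "done": "done"}
actions = {"init": "lookup", "looked": "ad2", "done": "stop"}
next_state = lambda r, o: order[r]
act = lambda r: actions[r]

trace = exe_system(next_state, act, "init", [("f", None), ("f", "ad2")])
```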

Noninterference. Recall that we set $o_1 \equiv o_2$ for any two observations $o_1$ and $o_2$ that differ only by sensitive
information. To not use the sensitive information, the system $s$ should treat such related observations
identically.

To formalize this notion, we raise $\equiv$ to work over sequences of observations and actions (i.e., executions
and sequences of observations). For such sequences $\vec{x}$ and $\vec{y}$ in $(O \cup A)^*$, $\vec{x} \equiv \vec{y}$ iff they are of the same
length and for each pair of elements $x$ and $y$ at the same position in $\vec{x}$ and $\vec{y}$, respectively, $x \equiv y$ where $\equiv$ is
treated as equality when comparing actions.

Definition 3. A system $s$ has noninterference for $\equiv$ iff for all observation sequences $\vec{o}_1$ and $\vec{o}_2$ in $O^*$,
$\vec{o}_1 \equiv \vec{o}_2$ implies that $\mathrm{exe}(s, \vec{o}_1) \equiv \mathrm{exe}(s, \vec{o}_2)$.

Our  definition  corresponds  to  the  form  of  noninterference  enforced  by  most  type  systems  for  information 
flow.  (See  [37]  for  a  survey.)  Unlike  Goguen  and  Meseguer’s  definition,  ours  does  not  require  the  system’s 
behavior  to  remain  unchanged  regardless  of  whether  or  not  it  receives  sensitive  information.  Rather,  the 
system’s  behavior  may  change  upon  receiving  sensitive  information,  but  this  change  must  be  the  same 
regardless  of  the  value  of  the  sensitive  information.  (See  [43]  for  a  discussion.) 
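
For a small finite system, this notion of noninterference can be tested by brute force over short observation sequences, as in the following sketch. The bound `max_len`, the helper names, and the two toy systems are assumptions; because only sequences up to the bound are explored, a result of True is evidence rather than proof.

```python
from itertools import product

def run(next_state, act, r, observations):
    """Actions taken by a deterministic automaton on a sequence of observations."""
    actions = [act(r)]
    for o in observations:
        r = next_state(r, o)
        actions.append(act(r))
    return actions

def has_noninterference(next_state, act, r0, observations, class_of, max_len):
    """Check that equivalent observation sequences (up to max_len) yield equal actions."""
    for n in range(max_len + 1):
        for seq1 in product(observations, repeat=n):
            for seq2 in product(observations, repeat=n):
                if any(class_of(x) != class_of(y) for x, y in zip(seq1, seq2)):
                    continue  # the two sequences are not related by the equivalence
                if run(next_state, act, r0, seq1) != run(next_state, act, r0, seq2):
                    return False
    return True

same_class = lambda o: "gender"  # conflates "f" and "m"
remember = lambda r, o: o        # internal state is the last observation

blind = has_noninterference(remember, lambda r: "ad2", "start", ["f", "m"], same_class, 3)
leaky = has_noninterference(remember, lambda r: "ad1" if r == "f" else "ad2",
                            "start", ["f", "m"], same_class, 3)
```

The first system ignores what it remembers and passes; the second picks its ad from the conflated value and fails.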

Relationship.  We  now  characterize  the  relationship  between  our  quotienting  definition  of  information  use 
and  noninterference.  We  do  so  by  considering  a  control  system  s  operating  in  an  environment  modeled  by 
a POMDP $m$. We require that $s$ and $m$ share the same sets of actions $A$ and observations $O$. However,
the state spaces $\mathcal{R}$ of $s$ and $Q$ of $m$ differ, with $\mathcal{R}$ representing the internal states of the system and $Q$
representing the external states of the environment.

We relate systems and strategies by saying that a system $s$ implements a strategy $\sigma$ for $m/{\equiv}$ and beliefs
$\beta_1$ iff for all $\vec{o}$ in $O^*$, $\mathrm{exe}(s, \vec{o}) = \mathrm{exe}(m, \equiv, \sigma, \beta_1, \vec{o})$. We denote the set of such implementing systems as
$\mathrm{Imp}(m, \equiv, \sigma, \beta_1)$. This definition allows us to formalize the intuition that agents using strategies optimizing
$m/{\equiv}$ have noninterference for $\equiv$. In fact, systems implementing any strategy for $m/{\equiv}$ have noninterference
since any such implementation respects $\equiv$.

Theorem 3. For all systems $s$, POMDPs $m$, initial beliefs $\beta_1$, strategies $\sigma$, and equivalences $\equiv$, if $s$ is in
$\mathrm{Imp}(m, \equiv, \sigma, \beta_1)$, then $s$ has noninterference for $\equiv$.

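Proof. Assume that the system $s$ is in $\mathrm{Imp}(m, \equiv, \sigma, \beta)$. Then for any observation sequence $\vec{o}$,
$\mathrm{exe}(s, \vec{o}) = \mathrm{exe}(m, \equiv, \sigma, \beta, \vec{o})$.

Suppose that $\vec{o}_1 \equiv \vec{o}_2$. Since $\vec{o}_1 \equiv \vec{o}_2$, $|\vec{o}_1| = |\vec{o}_2|$. We can prove by induction over this length that
$\mathrm{exe}(m, \equiv, \sigma, \beta, \vec{o}_1) \equiv \mathrm{exe}(m, \equiv, \sigma, \beta, \vec{o}_2)$:

•  Base Case: $\vec{o}_1 = []$ and $\vec{o}_2 = []$. The result follows immediately since $\vec{o}_1 = \vec{o}_2$.

•  Inductive Case: $\vec{o}_1 = o_1{:}\vec{o}'_1$ and $\vec{o}_2 = o_2{:}\vec{o}'_2$. Since $\vec{o}_1 \equiv \vec{o}_2$, $\vec{o}'_1 \equiv \vec{o}'_2$ and $o_1 \equiv o_2$. For some $\beta'$,
$\mathrm{up}_{m/\equiv}(\beta, \sigma(\beta), {\equiv}[o_1]) = \beta' = \mathrm{up}_{m/\equiv}(\beta, \sigma(\beta), {\equiv}[o_2])$ since $o_1 \equiv o_2$. By the inductive hypothesis on $\vec{o}'_1$
and $\vec{o}'_2$, $\mathrm{exe}(m, \equiv, \sigma, \beta', \vec{o}'_1) \equiv \mathrm{exe}(m, \equiv, \sigma, \beta', \vec{o}'_2)$. Thus,

$$\mathrm{exe}(m, \equiv, \sigma, \beta, o_1{:}\vec{o}'_1) = \sigma(\beta){:}o_1{:}\mathrm{exe}(m, \equiv, \sigma, \beta', \vec{o}'_1) \equiv \sigma(\beta){:}o_2{:}\mathrm{exe}(m, \equiv, \sigma, \beta', \vec{o}'_2) = \mathrm{exe}(m, \equiv, \sigma, \beta, o_2{:}\vec{o}'_2)$$

Since $\mathrm{exe}(m, \equiv, \sigma, \beta, \vec{o}_1) \equiv \mathrm{exe}(m, \equiv, \sigma, \beta, \vec{o}_2)$, $\mathrm{exe}(s, \vec{o}_1) \equiv \mathrm{exe}(s, \vec{o}_2)$. □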

Agents obeying a purpose restriction under the cognitive definition (Definition 1) will employ a system
in $\mathrm{Imp}(m, \equiv, \sigma, \beta_1)$. Thus, Theorem 3 shows that the cognitive definition is sufficiently strong to rule out
information use.

Information  Use  for  Other  Purposes.  The  situation  is  subtler  for  the  weaker  behaviorist  definition 
(Definition  2)  and  the  algorithm  Audit  based  upon  it.  Systems  exist  that  will  pass  Audit  and  satisfy  the 
behaviorist  definition  despite  having  interference  by  using  the  protected  information  for  some  purpose  other 
than  the  restricted  one.  The  key  is  that  there  could  be  more  than  one  optimal  strategy  for  a  POMDP  and 
that  the  agent  may  use  the  choice  among  optimal  strategies  to  communicate  information.  The  behavior 
of  such  a  system  will  be  consistent  with  whichever  optimal  strategy  it  selects,  satisfying  the  behaviorist 
definition and Audit. However, such a system will not actually implement any strategy for the quotiented
POMDP $m/{\equiv}$ since it distinguishes between observations conflated by $\equiv$.

For example, consider modifying the motivating example found in Section 3 in two ways to make the
POMDP $m'_{ex}$. First, let ad2 come in two versions, $ad_2^a$ and $ad_2^b$, which are otherwise the same as the original
ad2. Second, change the POMDP so that the network must perform the action lookup before showing any ads.
Two optimal non-redundant strategies will exist for $m'_{ex}/{\equiv}$. Starting from the initial beliefs $\beta_{ex1}$ discussed
above, in one of the strategies, $\sigma_a$, the network will first perform lookup and then show $ad_2^a$. Under the
second, $\sigma_b$, it will show $ad_2^b$ after lookup. Under both, it then switches to the action stop.

The network's ability to choose between $\sigma_a$ and $\sigma_b$ can result in interference. In particular, the network
might not implement either of them and instead delay the choice between $ad_2^a$ and $ad_2^b$ until after the
observation from lookup informs it of the visitor's gender. The network could then use $ad_2^a$ for a female
and $ad_2^b$ for a male. While such a system would use the information and have interference, it obeys the
behaviorist definition with its actions consistent with either $\sigma_a$ in the case of a female or $\sigma_b$ in the case of a
male.
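
The gap between the behaviorist definition and noninterference in this example can be made concrete with a small Python sketch. It abstracts away the POMDP entirely: the two optimal plans are written out by hand, and the function names and the encoding of the two ad versions as "ad2a" and "ad2b" are hypothetical.

```python
# The two optimal, non-redundant plans of the modified example, written as
# action sequences that do not depend on the looked-up gender.
OPTIMAL_PLANS = {"sigma_a": ["lookup", "ad2a", "stop"],
                 "sigma_b": ["lookup", "ad2b", "stop"]}

def leaking_network(db_gender):
    """Actions of a system that picks the ad version from the looked-up gender."""
    ad = "ad2a" if db_gender == "f" else "ad2b"
    return ["lookup", ad, "stop"]

def passes_behaviorist_check(actions):
    """True iff the actions match some optimal plan, which is all a single-run audit sees."""
    return actions in OPTIMAL_PLANS.values()

def interferes(system, genders=("f", "m")):
    """True iff the system's actions depend on the conflated gender value."""
    return len({tuple(system(g)) for g in genders}) > 1

honest_network = lambda db_gender: OPTIMAL_PLANS["sigma_a"]
```

Each individual run of `leaking_network` matches an optimal plan, yet comparing its runs across genders exposes the dependence.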

Since such systems use the prohibited information only to choose between optimal strategies, doing so does
not actually increase their satisfaction of the purpose. Thus, this information use is not intuitively for that
purpose. The agent must be motivated by some other purpose, such as exfiltrating protected information
to a third party that can see which ad the network selects but not the visitor's gender directly. Thus, the
behaviorist  definition  does  not  allow  the  agent  to  use  the  information  for  the  purpose  prohibited  by  the 
restriction,  but  rather  allows  the  agent  to  use  the  information  for  some  other  purpose. 

The  auditor  might  want  to  prevent  such  interference  since  it  violates  the  cognitive  definition.  The 
modifications  to  the  example  illustrate  two  ways  that  the  auditor  can  do  so  if  he  has  sufficient  control  over 
the  agent’s  environment.  The  first  is  to  ensure  that  only  a  single  strategy  is  optimal  and  non-redundant.  The 
second  is  to  make  sure  that  the  agent  can  avoid  learning  the  protected  information  (such  as  by  performing 
the  action  lookup)  and  that  learning  it  incurs  a  cost.  When  learning  information  is  optional  and  costly,  the 
agent will only be able to learn it if doing so increases its total reward, and not just to select among optimal
strategies that do not depend upon using that information. A third possible modification is to require the
agent to perform an action committing it to a single strategy before it can learn the protected information.

In  some  cases  an  auditor  can  detect  such  information  flows  without  modifying  the  POMDP.  For  example, 
intuitively,  we  would  expect  the  ad  network  to  handle  more  than  one  visitor.  The  auditor  could  compare 
the  network’s  behavior  when  given  a  female  to  that  when  given  a  male.  A  difference  in  treatment  indicates 
that  the  network  is  not  consistently  implementing  either  of  the  optimal  strategies. 

7  Conclusion 

We  use  planning  to  create  the  first  formal  semantics  for  determining  when  information  is  used  for  a  purpose. 
We  have  provided  an  auditing  algorithm  based  on  our  formalism.  We  have  discussed  applying  our  algorithm 
to  the  problem  of  enforcing  purpose  restrictions  found  in  privacy  policies. 

Our  methods  have  applications  beyond  enforcing  purpose  restrictions.  For  example,  due  to  privacy 
concerns,  much  interest  exists  in  determining  how  third-party  data  collection  agencies  use  the  information 
they  collect.  (See  [23]  for  a  survey.)  Despite  being  a  question  of  information  flow,  program  analyses  are 
inapplicable  since  the  programs  are  unavailable,  as  in  our  setting.  Unlike  our  setting,  these  agencies  typically 
do  not  subject  themselves  to  purpose  restrictions.  Nevertheless,  their  desire  for  profit  implicitly  restrains 
their  behavior  in  a  manner  similar  to  a  purpose  restriction.  Thus,  our  semantics  and  algorithm  provide  a 
starting point for investigating such agencies.


Appendix:  Details  of  Example  POMDP

Here we provide details about the network POMDP $m_{ex}$. Formally, the state space is
$Q = \{f, m\} \times \{f, m, \bot\} \times \{ad_1, ad_2, ad_3, \emptyset\}$ with $f$, $m$, $\bot$, $ad_i$, and $\emptyset$ interpreted as in Section 3. For example,
the state $(f, \bot, ad_2)$ indicates that the visitor is a female, the database does not record her gender, and the
network has shown her ad2. $\rho((f, \bot, ad_2)) = 0$ since the visitor has already seen an ad.

The actions and states are related by the transition function $\tau : Q \times A \to \mathrm{Dist}(Q)$. $\tau(q, a)$ is a distribution
over states such that for each state $q'$, $\tau(q, a)(q')$ is the probability of the environment transitioning from state
$q$ to state $q'$ by the network performing action $a$. While the network has uncertainty about the gender
of the visitor, each action selected by the network deterministically results in the next state. Thus, in this
model, for all states $q$ and actions $a$, the distribution $\tau(q, a)$ is always a degenerate distribution that assigns
a probability of 1 to exactly one state. Let $\delta(q)$ denote the degenerate distribution assigning the probability
of 1 to the state $q$. In our model, $\tau((g, d, \emptyset), ad_i) = \delta((g, d, ad_i))$ for all $g$ in $\{f, m\}$, $d$ in $\{f, m, \bot\}$, and $i$
in $\{1, 2, 3\}$, reflecting that showing an advertisement does not change the visitor's gender or the network's
database. $\tau((g, d, ad_i), ad_j) = \delta((g, d, ad_i))$ since the network can show the visitor only one advertisement.
$\tau(q, \mathsf{lookup}) = \delta(q)$ since looking up information in the database does not change the state of the environment.

The function $\nu : A \times Q \to \mathrm{Dist}(O)$ relates these observations to actions and states. Again we restrict our
attention to degenerate distributions since our example contains uncertainty but not truly random processes.
For each state $(g, d, a')$, the lookup action results in the observation $(d, a')$. For simplicity, we model actions of
showing an advertisement as providing a similar observation. Thus, for all actions $a$, $\nu(a, (g, d, a')) = \delta((d, a'))$.
(Since the network only gets to show one advertisement and following actions do not affect its total reward,
the observation made from showing an advertisement is of no consequence.)
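
Because every distribution in this example is degenerate, the transition and observation functions can be sketched in Python as ordinary functions that return the single outcome of probability 1. The encoding of states as tuples, of $\emptyset$ as `None`, and of $\bot$ as the string "_" is an assumption made for illustration.

```python
NO_AD = None                      # stands for "no advertisement shown yet"
ADS = ("ad1", "ad2", "ad3")

def tau(state, action):
    """Deterministic transition: returns the unique next state."""
    g, d, shown = state
    if action in ADS and shown is NO_AD:
        return (g, d, action)     # the first ad shown is recorded in the state
    return state                  # lookup, or a later ad, changes nothing

def nu(action, state):
    """Deterministic observation: the database entry and the ad shown so far."""
    _, d, shown = state
    return (d, shown)

start = ("f", "_", NO_AD)         # female visitor, no database entry, no ad yet
after_ad = tau(start, "ad2")
```

Showing a second advertisement from `after_ad` leaves the state unchanged, matching the rule that the network can show the visitor only one advertisement.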